Data overview

In this lab we use data on students who have already graduated from the Computer Science program at UFCG. For each student, the dataset holds the final grade obtained in every course (5-10) as well as the academic performance coefficient (cra, roughly 4-10).

We want to run regression analyses using the courses of the first two semesters and cra to try to answer the following question:

Does student performance in the first two semesters explain, to some degree, performance in the program as a whole?

To that end, we build a regression model with the first- and second-semester courses. Throughout this document we answer questions about the model and make comparisons. Below is a brief summary of how the variables are distributed when paired with one another, along with their correlation coefficients.

Legend

library(ggplot2)
library(GGally)  # provides ggpairs() and wrap()

# One row per graduate: final grade per course plus cra
graduados = read.csv("graduados_disciplinas.csv")

# Keep only the first- and second-semester courses and cra
graduados = graduados[, c("Cálculo.Diferencial.e.Integral.I",  "Álgebra.Vetorial.e.Geometria.Analítica", "Leitura.e.Produção.de.Textos", "Programação.I", "Laboratório.de.Programação.I", "Introdução.à.Computação", "Cálculo.Diferencial.e.Integral.II", "Matemática.Discreta", "Programação.II", "Laboratório.de.Programação.II", "Teoria.dos.Grafos", "Fundamentos.de.Física.Clássica", "cra")]

# Shorter column names for the plots and model output
colnames(graduados) = c("Cálculo.1", "Vetorial", "LPT", "P1", "LP1", "IC", "Cálculo.2", "Discreta", "P2", "LP2", "Grafos", "Física.3", "cra")

ggpairs(graduados, lower = list(continuous = "smooth"), upper = list(continuous = wrap("cor", size = 10)))

# ggcorr(graduados1[, 2:8], geom = "circle", nbreaks = 5)
# ggcorr(graduados1[, 2:8], nbreaks = 5,  label = TRUE, label_size = 3, label_round = 2, label_alpha = TRUE)
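
Because many students lack a grade in at least one course, it may help to count missing values before modeling. The sketch below uses a small simulated data frame (`toy`, a hypothetical stand-in for `graduados`, not real grades) to show how to count NAs per column, how many rows a model fit would actually keep, and how to compute correlations pair by pair.

```r
# Simulated stand-in for `graduados`; the values are NOT real grades.
set.seed(1)
toy = data.frame(P1 = runif(50, 5, 10), P2 = runif(50, 5, 10))
toy$cra = 0.5 * toy$P1 + 0.3 * toy$P2 + rnorm(50, sd = 0.3)
toy$P2[1:20] = NA  # pretend 20 students have no P2 grade on record

# Missing values per column
na_per_column = colSums(is.na(toy))

# Rows usable by lm(cra ~ ., ...), which drops every row containing an NA
complete_rows = sum(complete.cases(toy))

# Pairwise correlations: each pair uses the rows where both values exist
correlations = cor(toy, use = "pairwise.complete.obs")

print(na_per_column)
print(complete_rows)
print(round(correlations, 2))
```

On the real data, the same three calls can be applied directly to `graduados`.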

Is a multiple regression model with all variables plausible for explaining the variation in y (cra)? To what degree?

rl = lm(cra ~ ., data = graduados)

summary(rl)
## 
## Call:
## lm(formula = cra ~ ., data = graduados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8273 -0.2988  0.1069  0.2796  1.0032 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.33894    0.59783   2.240  0.02758 *  
## Cálculo.1    0.02121    0.04907   0.432  0.66661    
## Vetorial     0.04443    0.04762   0.933  0.35327    
## LPT          0.09172    0.05167   1.775  0.07925 .  
## P1          -0.02593    0.07684  -0.337  0.73660    
## LP1         -0.02472    0.07450  -0.332  0.74082    
## IC           0.10196    0.08639   1.180  0.24098    
## Cálculo.2   -0.00100    0.05302  -0.019  0.98499    
## Discreta     0.23935    0.05863   4.083 9.63e-05 ***
## P2           0.29214    0.09553   3.058  0.00293 ** 
## LP2         -0.02848    0.06666  -0.427  0.67024    
## Grafos       0.09620    0.06302   1.526  0.13040    
## Física.3    -0.01024    0.06120  -0.167  0.86745    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5046 on 90 degrees of freedom
##   (309 observations deleted due to missingness)
## Multiple R-squared:  0.6889, Adjusted R-squared:  0.6474 
## F-statistic: 16.61 on 12 and 90 DF,  p-value: < 2.2e-16
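
As a quick check on how to read this output, the arithmetic below recomputes the adjusted R-squared from the reported R-squared and degrees of freedom, and works out how much of the dataset the fit actually used. All numbers are copied from the summary above; the variable names are ours.

```r
# Values copied from the summary(rl) output
r2     = 0.6889   # Multiple R-squared
p      = 12       # number of predictors
n_used = 90 + 13  # residual df + estimated coefficients (12 slopes + intercept)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 = 1 - (1 - r2) * (n_used - 1) / (n_used - p - 1)

# 309 rows were dropped for missing values
n_total = n_used + 309

print(round(adj_r2, 4))   # agrees with the reported 0.6474
print(n_used / n_total)   # share of students the model was fitted on
```

The model explains roughly 65% to 69% of the variation in cra, but it was fitted on 103 of 412 students, so the result describes only the graduates with a grade on record for all twelve courses.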

Are all variables useful to the regression model?

If the answer to the previous question was no, build a new model without those variables and compare it to the model with all variables (e.g. in terms of R2 and RSE).

# 
# ggplot(graduados1, aes(graduados1$IC, graduados1$cra)) +  
#   geom_point(alpha = 0.1, position = position_jitter(width = 0.3), color="purple4") + 
#   labs(title="Previsão do modelo", x= "Nota em IC", y="CRA") +  
#   geom_line(aes(y = predict(rl1, graduados1)), colour = "red")
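
One way to compare a full and a reduced model is sketched below on simulated data (`toy` is hypothetical, and `LP2` is generated to be unrelated to cra on purpose). The helper `compare_fit` is our own name, not part of any library. It collects R2, adjusted R2, and RSE, and `anova()` then runs a partial F-test on the dropped predictors.

```r
# Simulated data; NOT real grades.
set.seed(42)
n = 120
toy = data.frame(
  Discreta = runif(n, 5, 10),
  P2       = runif(n, 5, 10),
  LP2      = runif(n, 5, 10)  # simulated as unrelated to cra
)
toy$cra = 2 + 0.25 * toy$Discreta + 0.30 * toy$P2 + rnorm(n, sd = 0.5)

full_model    = lm(cra ~ ., data = toy)
reduced_model = lm(cra ~ Discreta + P2, data = toy)

# Hypothetical helper: pull the three fit statistics out of a model
compare_fit = function(model) {
  s = summary(model)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, rse = s$sigma)
}

comparison = rbind(full = compare_fit(full_model),
                   reduced = compare_fit(reduced_model))
print(round(comparison, 4))

# Partial F-test: do the dropped predictors add explanatory power?
print(anova(reduced_model, full_model))
```

On the real data, both models should be fitted on the same rows (for example on `na.omit(graduados)`), since `anova()` requires models fitted to the same observations and R2 values from different subsets are not directly comparable.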

Analyze the residual plots for each variable and check whether any of them (one or more) indicates that the errors are not random.
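
A minimal sketch of such residual plots, using base R graphics and simulated data (`toy` is a hypothetical stand-in, not real grades). For each predictor, the residuals are plotted against the predictor with a smoothed trend line. A curve that drifts away from zero would suggest structure the model has missed.

```r
# Simulated data; NOT real grades.
set.seed(7)
n = 100
toy = data.frame(Discreta = runif(n, 5, 10), P2 = runif(n, 5, 10))
toy$cra = 2 + 0.25 * toy$Discreta + 0.30 * toy$P2 + rnorm(n, sd = 0.5)

model = lm(cra ~ ., data = toy)
res = residuals(model)

# One panel per predictor; a random cloud around zero is the hoped-for pattern
predictors = setdiff(names(toy), "cra")
old_par = par(mfrow = c(1, length(predictors)))
for (v in predictors) {
  plot(toy[[v]], res, xlab = v, ylab = "Residual",
       main = paste("Residuals vs", v))
  abline(h = 0, lty = 2)                     # reference line at zero
  lines(lowess(toy[[v]], res), col = "red")  # smoothed trend of the residuals
}
par(old_par)
```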

Which semester better explains final performance (the first or the second)?

graduados1 = graduados[,c("Cálculo.1", "Vetorial", "LPT", "P1", "LP1", "IC", "cra")]
graduados2 = graduados[, c("Cálculo.2", "Discreta", "P2", "LP2", "Grafos", "Física.3", "cra")]

ggpairs(graduados1, upper = list(continuous = wrap("cor", size = 10)))
## (Warnings condensed: in each panel, ggpairs removed between 79 and 118 rows containing missing or non-finite values.)

ggpairs(graduados2, upper = list(continuous = wrap("cor", size = 10))) 
## (Warnings condensed: in each panel, ggpairs removed between 75 and 299 rows containing missing or non-finite values.)

graduados1 = na.omit(graduados1)
graduados2 = na.omit(graduados2)

rl1 = lm(cra ~ ., data = graduados1)
rl2 = lm(cra ~ ., data = graduados2)

summary(rl1)
## 
## Call:
## lm(formula = cra ~ ., data = graduados1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.53290 -0.31108  0.07564  0.37949  1.31148 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.89445    0.38666   2.313 0.021423 *  
## Cálculo.1    0.11604    0.02803   4.141 4.57e-05 ***
## Vetorial     0.12305    0.03221   3.821 0.000164 ***
## LPT          0.11915    0.03672   3.245 0.001315 ** 
## P1           0.06425    0.04076   1.576 0.116054    
## LP1          0.08620    0.04219   2.043 0.041974 *  
## IC           0.32061    0.04853   6.606 1.94e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5668 on 284 degrees of freedom
## Multiple R-squared:  0.5488, Adjusted R-squared:  0.5392 
## F-statistic: 57.56 on 6 and 284 DF,  p-value: < 2.2e-16
summary(rl2)
## 
## Call:
## lm(formula = cra ~ ., data = graduados2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59303 -0.35328  0.08232  0.34269  1.01093 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.101453   0.488158   4.305 3.76e-05 ***
## Cálculo.2   0.018291   0.048695   0.376  0.70795    
## Discreta    0.266726   0.054252   4.916 3.27e-06 ***
## P2          0.257116   0.086630   2.968  0.00371 ** 
## LP2         0.004893   0.056411   0.087  0.93105    
## Grafos      0.129179   0.052604   2.456  0.01570 *  
## Física.3    0.039475   0.053422   0.739  0.46160    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5048 on 105 degrees of freedom
## Multiple R-squared:  0.6628, Adjusted R-squared:  0.6435 
## F-statistic:  34.4 on 6 and 105 DF,  p-value: < 2.2e-16
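
Judging from the degrees of freedom above, rl1 was fitted on 291 students (284 + 7 coefficients) and rl2 on 112 (105 + 7), so the two R-squared values come from different samples. One possible way to make the comparison fairer, sketched here on simulated data (`toy` is hypothetical), is to keep only the students with every grade on record and fit both models on that common subset.

```r
# Simulated data; NOT real grades.
set.seed(3)
n = 200
toy = data.frame(IC = runif(n, 5, 10), Discreta = runif(n, 5, 10))
toy$cra = 1 + 0.3 * toy$IC + 0.3 * toy$Discreta + rnorm(n, sd = 0.5)
toy$Discreta[1:80] = NA  # second-semester grade missing for many students

# Keep only students with every grade on record, then fit both models on them
common = na.omit(toy)
period1 = lm(cra ~ IC, data = common)
period2 = lm(cra ~ Discreta, data = common)

print(c(n_period1 = nobs(period1), n_period2 = nobs(period2)))
print(c(adj_r2_period1 = summary(period1)$adj.r.squared,
        adj_r2_period2 = summary(period2)$adj.r.squared))
```

The trade-off is a smaller sample for the first-semester model, in exchange for fit statistics that refer to the same students.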

Use the model to predict your own performance and compare the prediction with your current CRA. Comment on the result.
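
A prediction of this kind can be obtained with `predict()`. The sketch below uses simulated data and made-up grades (`toy` and `my_grades` are hypothetical); asking for a prediction interval shows how wide the uncertainty around a single student's predicted cra can be.

```r
# Simulated data; NOT real grades.
set.seed(11)
n = 100
toy = data.frame(Discreta = runif(n, 5, 10), P2 = runif(n, 5, 10))
toy$cra = 2 + 0.25 * toy$Discreta + 0.30 * toy$P2 + rnorm(n, sd = 0.5)
model = lm(cra ~ ., data = toy)

# Hypothetical grades for one student; column names must match the predictors
my_grades = data.frame(Discreta = 8.0, P2 = 7.5)

# Point prediction plus a 95% prediction interval for an individual student
prediction = predict(model, newdata = my_grades,
                     interval = "prediction", level = 0.95)
print(prediction)
```

With the real models, `newdata` needs one column for every predictor used in the fit (for example all six second-semester courses for `rl2`).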